What has been done

This section shows the main steps that have been applied to pre-process the raw data.

aCDOM spectra

  • The CDOM spectra were modeled according to the information in Babin 2003.

    • acdom spectra were re-fitted using all data points between 350 and 500 nm.
  • Average background values were calculated between 683 and 687 nm.

  • Some files were in binary format and could not be opened (e.g., C2001000.YSA).

  • Some spectra start at 300 nm, others at 350 nm.

  • Calculated the correlation between the measured and the fitted values.

    • Fits with R² lower than 0.99 were removed from the data.
  • Exported the complete spectra (350-700 nm): both the raw and the modeled data.
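The aCDOM steps above (background subtraction, re-fit, correlation between measured and fitted values, R² filter) can be sketched in Python for illustration. This assumes a single-exponential model a_CDOM(λ) = a_CDOM(443) exp(-S (λ - 443)) fitted by ordinary least squares on log-transformed values; that log-linear fit is a stand-in and not necessarily the (possibly nonlinear) procedure of Babin 2003. The function names and the 443 nm reference wavelength are assumptions.

```python
import math
from statistics import fmean


def subtract_background(wavelengths, absorption, lo=683.0, hi=687.0):
    """Subtract the mean absorption between lo and hi nm (inclusive) from every value."""
    background = fmean([a for w, a in zip(wavelengths, absorption) if lo <= w <= hi])
    return [a - background for a in absorption], background


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)


def fit_acdom(wavelengths, absorption, lo=350.0, hi=500.0, ref=443.0):
    """Log-linear least-squares fit of a(λ) = a_ref * exp(-S * (λ - ref)) on [lo, hi] nm.

    Non-positive values cannot be log-transformed and are skipped.
    Returns (a_ref, S, r2); r2 is the squared correlation between the
    measured and fitted values, in linear (not log) units.
    """
    pts = [(w - ref, a) for w, a in zip(wavelengths, absorption) if lo <= w <= hi and a > 0]
    x = [dx for dx, _ in pts]
    y = [math.log(a) for _, a in pts]
    mx, my = fmean(x), fmean(y)
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sum((xi - mx) ** 2 for xi in x)
    a_ref = math.exp(my - slope * mx)
    s = -slope
    measured = [a for _, a in pts]
    fitted = [a_ref * math.exp(-s * dx) for dx in x]
    return a_ref, s, pearson_r(measured, fitted) ** 2


def keep_fit(r2, threshold=0.99):
    """Fits with R² below the threshold are discarded."""
    return r2 >= threshold
```

For a raw spectrum `raw` sampled at wavelengths `wl` (both hypothetical names), the chain would be `corrected, bg = subtract_background(wl, raw)`, then `a443, s, r2 = fit_acdom(wl, corrected)`, keeping the fit only if `keep_fit(r2)` is true.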

Phytoplankton and non-algal absorption

  • In the original data, an average background calculated between 745 and 750 nm had been subtracted. This background was added back to the spectra. New average background values were then calculated between 746 and 750 nm and subtracted from the spectra.
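The re-baselining described above amounts to adding back the old 745-750 nm average and subtracting a new 746-750 nm average. A minimal sketch, assuming each spectrum is available as wavelength and absorption lists together with the background value that was originally subtracted; the function name `rebaseline` is hypothetical.

```python
from statistics import fmean


def rebaseline(wavelengths, absorption, old_background, lo=746.0, hi=750.0):
    """Undo the original background correction and apply a new one.

    old_background is the value originally subtracted (745-750 nm average).
    Returns the re-corrected spectrum and the new background value.
    """
    # Restore the uncorrected spectrum by adding the old background back.
    restored = [a + old_background for a in absorption]
    # New background: mean of the restored spectrum between lo and hi nm (inclusive).
    new_background = fmean([a for w, a in zip(wavelengths, restored) if lo <= w <= hi])
    return [a - new_background for a in restored], new_background
```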

Irradiance

  • There were negative values in the irradiance data (Ed, Eu, Kd, Ku). I have cleaned the data by setting these negative values to NA.

    • To be validated.

Reflectance

  • Reflectance values outside the 0-1 range were set to NA.
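Both cleaning rules (negative irradiance values set to NA, reflectance outside 0-1 set to NA) can be written as one range mask. A minimal sketch, using `math.nan` as the Python stand-in for R's NA; the function name `mask_outside` is hypothetical, and the reflectance values in the usage lines are made up for illustration.

```python
import math


def mask_outside(values, lo=-math.inf, hi=math.inf):
    """Replace missing values (None/NaN) and values outside [lo, hi] with NaN."""
    cleaned = []
    for v in values:
        if v is None or math.isnan(v) or not (lo <= v <= hi):
            cleaned.append(math.nan)
        else:
            cleaned.append(v)
    return cleaned


# Irradiance (Ed, Eu, Kd, Ku): only negative values are invalid (ed values of A2008000).
ed_clean = mask_outside([15.693, 3.911, -2.781], lo=0.0)
# Reflectance: values must lie within [0, 1] (made-up example values).
r_clean = mask_outside([0.012, 1.3, -0.001], lo=0.0, hi=1.0)
```

Zero stays valid for irradiance because the bounds are inclusive.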

Other stuff

  • Extracted extra variables (DOC, AQY) from Massimo 2000.

Visualizations

Just some graphs to visualize the data. Note that the same color palette is used to represent the areas in all graphs.

Geographical map

A total of 424 different stations were sampled during the COASTLOOC expeditions.

Available variables

Just an overview of the available variables (excluding radiometric measurements).

Absorption measurements

Overview of the averaged absorption spectra for each area. The acdom spectra have been refitted from the original/raw spectra corrected with the new 746-750 nm background.

Comparing acdom443 across areas shows a clear gradient from open-ocean to coastal waters.

We can see that the DOC follows the same pattern as acdom443.

We can also use scatter-plots to further explore the relationships among variables.
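The visual agreement between DOC and acdom443 could be complemented by a correlation coefficient computed on the stations where both variables are available. A minimal sketch, assuming missing values are stored as None or NaN; `paired_correlation` is a hypothetical helper, and a rank (Spearman) correlation would be a reasonable alternative if the relationship is not linear.

```python
import math
from statistics import fmean


def paired_correlation(x, y):
    """Pearson r computed only on positions where both x and y are present.

    Missing values may be None or NaN. Returns (r, n_pairs).
    """
    def present(v):
        return v is not None and not math.isnan(v)

    pairs = [(a, b) for a, b in zip(x, y) if present(a) and present(b)]
    xs = [a for a, _ in pairs]
    ys = [b for _, b in pairs]
    mx, my = fmean(xs), fmean(ys)
    sxy = sum((a - mx) * (b - my) for a, b in pairs)
    sxx = sum((a - mx) ** 2 for a in xs)
    syy = sum((b - my) ** 2 for b in ys)
    return sxy / math.sqrt(sxx * syy), len(pairs)
```

With per-station lists `doc` and `acdom443` (hypothetical names), `r, n = paired_correlation(doc, acdom443)` gives the coefficient and the number of stations it is based on.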

Relationships between some pigments.

TODOs

  • There are two stations without geographical coordinates: C2001000, C2002000.

  • Find a good way to flag the data:

    • For example, all a_phy, a_nap, a_tot spectra contain negative absorption values, but this does not mean they are bad spectra.
  • Many nutrient parameters have values of zero. Are these true zeros, or do they indicate missing values?
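One way to flag rather than delete data is to compute flags alongside the values and leave the values untouched. A minimal sketch, assuming each station record is a dict; the helper names, the tolerance and the nutrient column names in the usage line (`no3`, `po4`, `sio4`) are hypothetical. Counting exact zeros per column does not answer whether they are true zeros, but it shows which parameters are affected.

```python
from collections import Counter


def flag_negative(absorption, tolerance=0.0):
    """True if any value is below -tolerance; the spectrum itself is left unchanged."""
    return any(a < -tolerance for a in absorption)


def count_zeros(records, columns):
    """Count, for each column, how many records hold an exact zero."""
    counts = Counter()
    for record in records:
        for column in columns:
            if record.get(column) == 0:
                counts[column] += 1
    return counts
```

For example, `count_zeros(records, ["no3", "po4", "sio4"])` would list how often each of these columns is exactly zero.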

aCDOM

Babin 2003 states:

A baseline correction was applied by subtracting the absorbance value averaged over a 5-nm interval around 685 nm from all the spectral values.

The data contain a variable called y_model_intercept. Is this variable really derived from a model? I think it is rather the average value calculated between 683 and 687 nm. If so, y_model_intercept should be renamed to background_a_cdom_average_683_687.
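The hypothesis that y_model_intercept is simply the 683-687 nm average can be checked directly: recompute the average from each raw spectrum and compare it with the stored value. A minimal sketch, assuming raw spectra and stored y_model_intercept values are available per station as dicts; the helper names and the tolerances are arbitrary choices.

```python
import math
from statistics import fmean


def background_683_687(wavelengths, absorption):
    """Mean absorption between 683 and 687 nm (inclusive)."""
    return fmean([a for w, a in zip(wavelengths, absorption) if 683 <= w <= 687])


def intercept_mismatches(spectra, stored, rel_tol=1e-3):
    """Stations whose stored y_model_intercept differs from the recomputed average.

    spectra: {station: (wavelengths, absorption)} of raw acdom spectra
    stored:  {station: y_model_intercept}
    Returns {station: (stored value, recomputed value)}.
    """
    mismatches = {}
    for station, (wl, a) in spectra.items():
        if station not in stored:
            continue
        recomputed = background_683_687(wl, a)
        if not math.isclose(recomputed, stored[station], rel_tol=rel_tol, abs_tol=1e-6):
            mismatches[station] = (stored[station], recomputed)
    return mismatches
```

An empty result over all stations would support the proposed renaming.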

aphy and atot

Babin 2003 states:

…from all the measured spectral values of ap(λ) and aNAP(λ), respectively (to be exact, the averages of the measured values between 746 and 750 nm were subtracted).

However, I was told that the background values were calculated between 745 and 750 nm. I calculated new background values between 746 and 750 nm. When both sets of background values are compared, they fall exactly on the 1:1 line.
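The agreement on the 1:1 line can be quantified with the largest absolute difference and an ordinary least-squares slope and intercept between the two sets of background values. A minimal sketch, assuming the old and new backgrounds are paired lists in the same station order; the helper name is hypothetical.

```python
from statistics import fmean


def compare_backgrounds(old, new):
    """Summarize agreement between paired background values.

    Returns (max_abs_diff, slope, intercept), where slope and intercept come from an
    ordinary least-squares fit of new on old; slope 1 and intercept 0 is the 1:1 line.
    """
    max_abs_diff = max(abs(n - o) for o, n in zip(old, new))
    mo, mn = fmean(old), fmean(new)
    slope = sum((o - mo) * (n - mn) for o, n in zip(old, new)) / sum((o - mo) ** 2 for o in old)
    intercept = mn - slope * mo
    return max_abs_diff, slope, intercept
```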

  • Some stations in the raw acdom data (e.g., C5006066) have no entries in SurfaceData5(C4corr).txt, so we have neither coordinates nor area for these stations. The following list shows all acdom stations without metadata.
## # A tibble: 14 x 1
##    station 
##    <chr>   
##  1 C5006066
##  2 C5007015
##  3 C5008009
##  4 C5009015
##  5 C5015012
##  6 C5030023
##  7 C5033015
##  8 C5034013
##  9 C5035012
## 10 C5036013
## 11 C5037013
## 12 C5049017
## 13 C5050020
## 14 C5053025
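The list above is an anti-join between the acdom stations and the stations present in SurfaceData5(C4corr).txt (in R, dplyr::anti_join() does this). The same check in Python, assuming both station lists have already been read from the files (parsing not shown); stripping whitespace is a precaution in case the text files pad the codes.

```python
def stations_without_metadata(acdom_stations, metadata_stations):
    """Sorted acdom station codes that have no entry in the metadata table.

    Codes are stripped of surrounding whitespace before comparison, in case
    the text files pad them.
    """
    known = {s.strip() for s in metadata_stations}
    return sorted({s.strip() for s in acdom_stations} - known)
```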

Variable naming

My understanding is that the background variables in the original data should be renamed as follows (to validate: 745-750 nm or 746-750 nm):

  • background_a_cdom_average_683_687 = y_model_intercept
  • background_a_phy_average_745_750 = back_pga
  • background_a_nap_average_745_750 = back_dta
  • background_a_tot_average_745_750 = back_toa
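The proposed renaming can be kept as a single mapping applied to the column names, so that it is defined in one place if the 745 vs 746 nm question changes the suffixes. A sketch in Python, assuming the data are handled as a list of column names (in R, dplyr::rename() would do the same); the helper name is hypothetical.

```python
# Proposed renaming; the 745_750 suffixes are pending the 745 vs 746 nm check.
BACKGROUND_RENAMES = {
    "y_model_intercept": "background_a_cdom_average_683_687",
    "back_pga": "background_a_phy_average_745_750",
    "back_dta": "background_a_nap_average_745_750",
    "back_toa": "background_a_tot_average_745_750",
}


def rename_columns(columns, mapping=BACKGROUND_RENAMES):
    """Replace old background column names; other names are returned unchanged."""
    return [mapping.get(c, c) for c in columns]
```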

Irradiance data

  • Eu means that Eu0- was estimated (check with Marcel; I do not remember what it means).

  • Ed is Ed0-, calculated as 0.96 × Ed0+.

    • In the data files, rename ed to ed0-.
  • Should negative values simply be set to NA, or should the whole spectral profile be removed? As an example, consider the ed values for station A2008000.

station   wavelength (nm)     eu      ed       ku     kd
A2008000              411  2.500  15.693  -95.000  0.095
A2008000              443  2.800  24.754  -95.000  0.083
A2008000              456  3.300  30.747  -95.000  0.077
A2008000              490  3.200  26.328  -95.000  0.070
A2008000              509     NA  25.887  -95.000  0.082
A2008000              532  2.300  21.366  -95.000  0.091
A2008000              559  1.600  19.003  -95.000  0.106
A2008000              619  0.260  13.520    0.376  0.324
A2008000              665  0.154  16.091    0.301  0.445
A2008000              683  0.287  13.657    0.205  0.494
A2008000              705  0.051   7.725    0.225  0.654
A2008000              779     NA   3.911  -95.000     NA
A2008000              866     NA  -2.781  -95.000     NA
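To help decide between the two options, the sketch below applies both to the ku and ed values of A2008000 from the table. It assumes, without confirmation, that -95.000 is a fill code for missing data rather than a measured value, since it repeats exactly across wavelengths. The 50% threshold for discarding a whole profile is an arbitrary example, and `clean_profile` and `FILL_VALUE` are hypothetical names.

```python
import math

FILL_VALUE = -95.0  # assumption: repeated -95.000 is a "no data" code, to be confirmed


def clean_profile(values, max_invalid_fraction=0.5):
    """Apply both options to one spectral profile.

    Negative values (the assumed fill value included) and NaN count as invalid
    and are set to NaN. If the share of invalid values exceeds
    max_invalid_fraction, the whole profile is dropped (None is returned).
    Also returns the number of fill values and of other negative values.
    """
    n_fill = sum(v == FILL_VALUE for v in values)
    n_negative = sum(v < 0 and v != FILL_VALUE for v in values)
    cleaned = [math.nan if math.isnan(v) or v < 0 else v for v in values]
    invalid_fraction = sum(math.isnan(v) for v in cleaned) / len(cleaned)
    profile = None if invalid_fraction > max_invalid_fraction else cleaned
    return profile, n_fill, n_negative


# Station A2008000, 411-866 nm, values from the table above.
ku = [-95.0] * 7 + [0.376, 0.301, 0.205, 0.225] + [-95.0] * 2
ed = [15.693, 24.754, 30.747, 26.328, 25.887, 21.366, 19.003,
      13.520, 16.091, 13.657, 7.725, 3.911, -2.781]
ku_result = clean_profile(ku)  # 9 of 13 values invalid: profile dropped
ed_result = clean_profile(ed)  # 1 of 13 invalid: kept, 866 nm set to NaN
```

Separating fill values from other negatives makes it visible that the ku profile is mostly fill codes, whereas the single negative ed value at 866 nm looks like a genuine low-signal measurement.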

AC9

  • a(715) is always 0.

  • There are negative values.

Orientation of the paper

  • The data is a mix of temporal and spatial observations, so how should we present the data?

  • By area?